To solve problems where "…the number of things you don’t know is one of the things you don’t know!"
April 28, 2015
The detailed balance condition states for all Borel sets \(A, B \subset \mathcal{X}\):
\[ \int_{(x, x') \in A \times B} \pi(dx) P(x, dx') = \int_{(x, x') \in A \times B} \pi(dx') P(x', dx) \]
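As a concrete check, a Metropolis kernel on a small discrete state space satisfies this condition; the sketch below (the 3-state target and uniform proposal are illustrative choices, not from the text) verifies \(\pi_i P_{ij} = \pi_j P_{ji}\) numerically:

```python
import numpy as np

# Illustrative 3-state target and a Metropolis kernel with a uniform
# proposal over the other two states (all values are assumptions).
pi = np.array([0.2, 0.3, 0.5])
P = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        if i != j:
            P[i, j] = 0.5 * min(1.0, pi[j] / pi[i])  # propose j w.p. 1/2, then accept
    P[i, i] = 1.0 - P[i, :].sum()

# Detailed balance: pi_i * P_ij == pi_j * P_ji for every pair (i, j)
flows = pi[:, None] * P
print(np.allclose(flows, flows.T))  # True
```

Because each Metropolis step satisfies detailed balance pairwise, the flow matrix \(\pi_i P_{ij}\) comes out exactly symmetric.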
Scenario:
If the current state of the chain is \((k, \boldsymbol \theta_k)\), then:
Accept the proposed move to \((k^*, \boldsymbol \theta_{k^*}^*)\) with probability
\[ \alpha = \min\left\{1, \frac{\pi(\boldsymbol \theta_{k^*}^*) j(k^*| k) g'(\boldsymbol u^*)}{\pi(\boldsymbol \theta_k) j(k|k^*) g(\boldsymbol u)} \left|\frac{\partial h'(\boldsymbol \theta_{k^*}^*, \boldsymbol u^*)}{\partial (\boldsymbol \theta_k, \boldsymbol u)}\right|\right\} \]
where \(\boldsymbol u^* \sim g'\) (Chen, Ibrahim, and Shao 2000, 303)
Acceptance probability becomes
\[ \alpha = \min\left\{1, \frac{p(k^*)\pi(\boldsymbol \theta_{k^*}^*|D,\mathcal{M}_{k^*}) j(k^*| k) g'(\boldsymbol u^*)}{p(k)\pi(\boldsymbol \theta_k |D,\mathcal{M}_k ) j(k|k^*) g(\boldsymbol u)} \left|\frac{\partial h'(\boldsymbol \theta_{k^*}^*, \boldsymbol u^*)}{\partial (\boldsymbol \theta_k, \boldsymbol u)}\right|\right\} \]
Posterior model probability is \(\hat{p}(k | D) = \frac{1}{L} \sum\limits_{l = 1}^L \mathcal{I}(k_l = k)\) where \(L\) is the length of the MCMC chain and \(k_l\) is the model chosen at iteration \(l\).
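In code, this estimator is a one-liner over the sampled model indicators; the chain values below are made up for demonstration:

```python
import numpy as np

# Sketch: the post-processing step above, using an illustrative chain of
# sampled model indicators k_l (values are made up for demonstration).
k_chain = np.array([1, 1, 2, 1, 2, 2, 2, 1, 2, 2])
L = len(k_chain)
p_hat = {k: float(np.mean(k_chain == k)) for k in (1, 2)}
print(p_hat)  # {1: 0.4, 2: 0.6}
```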
If count data are overdispersed relative to the Poisson distribution, the Negative-Binomial distribution may be more appropriate.
Under an identically distributed Poisson model, one parameter \(\lambda > 0\)
\[ \mathcal{L}(Y|\lambda) = \prod_{i=1}^N \dfrac{\lambda^{Y_i}}{Y_i!}e^{-\lambda} \]
Under an identically distributed Negative-Binomial model, two parameters \(\lambda > 0\), \(\kappa > 0\)
\[ \mathcal{L}(Y|\lambda, \kappa) = \prod_{i=1}^N \dfrac{\lambda^{Y_i}}{Y_i!} \dfrac{\Gamma \left(1/\kappa + Y_i\right)}{\Gamma \left(1/\kappa\right)\left(1/\kappa + \lambda\right)^{Y_i}}\left(1+\kappa \lambda\right)^{-1/\kappa} \]
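Written as code, both log-likelihoods take only a few lines, and the Negative-Binomial recovers the Poisson in the limit \(\kappa \to 0\). A sketch (the function names are my own):

```python
import numpy as np
from math import lgamma

def poisson_loglik(y, lam):
    """Log-likelihood of i.i.d. Poisson(lam) counts."""
    y = np.asarray(y, dtype=float)
    return float(np.sum(y * np.log(lam) - lam
                        - np.array([lgamma(v + 1.0) for v in y])))

def negbin_loglik(y, lam, kappa):
    """Log-likelihood under the (lam, kappa) parameterization above."""
    y = np.asarray(y, dtype=float)
    r = 1.0 / kappa
    terms = (np.array([lgamma(r + v) - lgamma(v + 1.0) for v in y])
             - lgamma(r) + y * np.log(lam) - y * np.log(r + lam)
             - r * np.log1p(kappa * lam))
    return float(np.sum(terms))
```

Note that \(\lambda/(1/\kappa + \lambda) = \kappa\lambda/(1+\kappa\lambda)\), so this is the usual NB2 parameterization with mean \(\lambda\) and variance \(\lambda(1 + \kappa\lambda)\).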
Consider the model selection problem with \(\{\mathcal{M}_k: k = 1,2\}\)
Under Model \(k = 1\), \(\theta_1 = \lambda\)
Under Model \(k = 2\), \(\theta_2 = (\lambda, \kappa)\)
Posterior Distribution
\[ \pi(k, \theta_k| Y) \propto \begin{cases} \frac{1}{2} \mathcal{L}(Y|\lambda)\, \text{Gamma}(\lambda \mid \alpha_{\lambda}, \beta_{\lambda}) & k=1\\ \frac{1}{2} \mathcal{L}(Y|\lambda,\kappa)\, \text{Gamma}(\lambda \mid \alpha_{\lambda}, \beta_{\lambda})\, \text{Gamma}(\kappa \mid \alpha_{\kappa}, \beta_{\kappa}) & k=2\\ \end{cases} \]
Since the two models are of differing dimensions, a reversible jump move is needed to travel between them.
Move from Model 1 to Model 2
Let \(x = (1, \theta_1)\). Set \(x' = (2, \theta_2')\), where
\[ \theta_2' = (\theta_{2,1}', \theta_{2,2}') = h(\theta_1, u) = (\theta_1, \mu e^u) \]
for fixed \(\mu\), where \(u \sim N(0, \sigma^2)\). Now \(\kappa = \mu e^u\).
Move from Model 2 to Model 1
Let \(x' = (2, \theta_2')\). Set \(x = (1, \theta_1)\) with \(\theta_1 = \theta_{2,1}'\), recovering \(u = \log(\theta_{2,2}'/\mu)\); no auxiliary \(u'\) is needed.
Then \((x, u)\) has dimension \(2 + 1 = 3\), which matches the dimension of \((x', u') = 3 + 0 = 3\).
Let the set of move types be \(m \in \left\{(1,2), (2,1)\right\}\).
Model 1 to Model 2
\[ |J| = \left\vert \begin{array}{cc} 1 & 0\\ 0 & \mu e^u\\ \end{array}\right\vert = \mu e^u \]
Model 2 to Model 1
Since \(u = \log(\theta_{2,2}'/\mu)\), we have \(\partial u/\partial \theta_{2,2}' = 1/\theta_{2,2}'\), so
\[ |J| = \left\vert \begin{array}{cc} 1 & 0\\ 0 & 1/\theta_{2,2}'\\ \end{array}\right\vert = 1/\theta_{2,2}' \]
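Both Jacobians can be verified numerically with finite differences; this sketch checks the Model 1 to Model 2 map \(h(\theta_1, u) = (\theta_1, \mu e^u)\) (the values of \(\mu\), \(\theta_1\), and \(u\) are arbitrary):

```python
import numpy as np

# Finite-difference check of |J| for h(theta1, u) = (theta1, mu * exp(u)).
mu, theta1, u = 2.0, 1.5, 0.3
eps = 1e-6

def h(t, u):
    return np.array([t, mu * np.exp(u)])

# columns: partials with respect to theta1 and u
J = np.column_stack([(h(theta1 + eps, u) - h(theta1, u)) / eps,
                     (h(theta1, u + eps) - h(theta1, u)) / eps])
det = np.linalg.det(J)
# |J| should equal mu * exp(u)
```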
Move from Model 1 to Model 2
\(\alpha_{1,2}(x, x') = \min\{1, A_{1,2}\}\) where \[ A_{1,2} = \dfrac{\pi(2, \theta_2')}{\pi(1, \theta_1)} \left\{\dfrac{1}{\sqrt{2\pi\sigma^2}} \exp \left\lbrack \dfrac{-u^2}{2\sigma^2} \right\rbrack \right\}^{-1} \mu e^u \]
Move from Model 2 to Model 1
\(\alpha_{2,1}(x', x) = \min\{1, A_{2,1}\}\) where \[ A_{2,1} = \dfrac{\pi(1, \theta_1)}{\pi(2, \theta_2')} \dfrac{1}{\sqrt{2\pi\sigma^2}} \exp \left\lbrack \dfrac{-\left[\log(\theta_{2,2}'/\mu)\right]^2}{2\sigma^2} \right\rbrack \frac{1}{\theta_{2,2}'} \]
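Putting the pieces together, the whole sampler can be sketched in Python. The Gamma priors are rate-parameterized, and every numeric choice below (hyperparameters, \(\mu\), \(\sigma\), proposal scales, synthetic data) is an assumption for illustration, not a value fixed by the text:

```python
import numpy as np
from math import lgamma, log, pi as PI

rng = np.random.default_rng(0)
A_L, B_L = 1.0, 1.0          # Gamma prior on lambda (assumed)
A_K, B_K = 1.0, 1.0          # Gamma prior on kappa (assumed)
MU, SIGMA = 1.0, 1.0         # jump proposal: kappa = MU * exp(u), u ~ N(0, SIGMA^2)

def log_lik(k, theta, y):
    lam = theta[0]
    base = np.sum(y * log(lam) - np.array([lgamma(v + 1) for v in y])) - lam * len(y)
    if k == 1:
        return base
    r = 1.0 / theta[1]
    # replace the exp(-lam) factors of the Poisson with the NB terms
    return (base + lam * len(y)
            + np.sum(np.array([lgamma(r + v) for v in y]) - lgamma(r) - y * log(r + lam))
            - len(y) * r * np.log1p(theta[1] * lam))

def log_gamma_pdf(x, a, b):
    return a * log(b) - lgamma(a) + (a - 1) * log(x) - b * x

def log_post(k, theta, y):   # equal model weights 1/2 cancel in ratios
    lp = log_lik(k, theta, y) + log_gamma_pdf(theta[0], A_L, B_L)
    return lp + (log_gamma_pdf(theta[1], A_K, B_K) if k == 2 else 0.0)

def rjmcmc(y, n_iter=1000):
    k, theta, ks = 1, np.array([float(np.mean(y)) + 0.1]), []
    for _ in range(n_iter):
        # within-model multiplicative random walk on each parameter
        for i in range(len(theta)):
            prop = theta.copy()
            prop[i] = theta[i] * np.exp(0.2 * rng.normal())
            if np.log(rng.random()) < (log_post(k, prop, y) - log_post(k, theta, y)
                                       + log(prop[i]) - log(theta[i])):  # log-scale Jacobian
                theta = prop
        if k == 1:               # Model 1 -> 2: draw u, set kappa = MU * exp(u)
            u = SIGMA * rng.normal()
            cand = np.array([theta[0], MU * np.exp(u)])
            log_g = -0.5 * log(2 * PI * SIGMA ** 2) - u ** 2 / (2 * SIGMA ** 2)
            logA = (log_post(2, cand, y) - log_post(1, theta, y)
                    - log_g + log(MU) + u)       # |J| = MU * exp(u)
            if np.log(rng.random()) < logA:
                k, theta = 2, cand
        else:                    # Model 2 -> 1: recover u = log(kappa / MU)
            u = log(theta[1] / MU)
            log_g = -0.5 * log(2 * PI * SIGMA ** 2) - u ** 2 / (2 * SIGMA ** 2)
            logA = (log_post(1, theta[:1], y) - log_post(2, theta, y)
                    + log_g - log(theta[1]))     # |J| = 1 / kappa
            if np.log(rng.random()) < logA:
                k, theta = 1, theta[:1].copy()
        ks.append(k)
    return np.array(ks)

# Overdispersed synthetic data: Negative-Binomial with mean 5, kappa = 1
y = rng.negative_binomial(n=1, p=1.0 / 6.0, size=200)
ks = rjmcmc(y)
```

On data this overdispersed, the chain should spend most of its time in Model 2, and \(\hat{p}(k|D)\) follows from the indicator average given earlier.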
Implementing reversible jump algorithms may seem difficult for several reasons.
These issues, however, are not the real source of difficulty.
The main issue is usually not whether a proposal mechanism will work, but whether it will work efficiently.
The tuning for a specific problem can be arduous, which has led to work on finding useful general techniques for selecting parts of the proposal mechanism.
Improving efficiency requires that the proposed state \((k', \theta'_{k'})\) and the existing state \((k, \theta_k)\) occupy regions of similar posterior support.
There are two main classes of methods for ensuring this:
The order of the method determines the type of constraint imposed:
Adaptive sampling: using past observations (even rejected ones) to make mid-run adjustments to the proposal
Delayed rejection: if proposal \(x'\) is rejected, try a backup proposal \(x''\)
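The second idea, delayed rejection, is easiest to see in a fixed-dimension setting; the sketch below targets a 1-D standard normal with an over-wide first-stage proposal and a narrower backup (the target and all scales are illustrative assumptions):

```python
import numpy as np
from math import exp

rng = np.random.default_rng(1)

def log_pi(x):
    return -0.5 * x * x          # unnormalized N(0, 1)

def dr_step(x, s1=5.0, s2=0.5):
    y1 = x + s1 * rng.normal()                      # stage-1 proposal x'
    a1_x = min(1.0, exp(log_pi(y1) - log_pi(x)))
    if rng.random() < a1_x:
        return y1
    y2 = x + s2 * rng.normal()                      # backup proposal x''
    a1_y2 = min(1.0, exp(log_pi(y1) - log_pi(y2)))  # alpha_1 seen from y2
    # stage-2 ratio keeps the stage-1 proposal density q1 on both paths
    num = exp(log_pi(y2) - (y1 - y2) ** 2 / (2 * s1 * s1)) * (1.0 - a1_y2)
    den = exp(log_pi(x) - (y1 - x) ** 2 / (2 * s1 * s1)) * (1.0 - a1_x)
    if den > 0.0 and rng.random() < min(1.0, num / den):
        return y2
    return x

x, draws = 0.0, []
for _ in range(5000):
    x = dr_step(x)
    draws.append(x)
draws = np.array(draws)
```

The stage-2 acceptance includes the \((1 - \alpha_1)\) factors from both the forward and reversed paths, which is what preserves detailed balance after the first rejection.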